PLOS Genetics

39 training papers 2019-06-25 – 2026-03-07

Top medRxiv preprints most likely to be published in this journal, ranked by match strength.

1
Explicitly modeling genetic ancestry to improve polygenic prediction accuracy for height in a large, admixed cohort of US Latinos: Findings from HCHS/SOL
2025-03-23 genetic and genomic medicine 10.1101/2025.03.21.25324423
#1 (11.3%)

Polygenic scores (PGS) offer moderate to high prediction accuracy for complex traits, but most are developed in European ancestry cohorts, reducing their performance in populations of other ancestries. This study aimed to improve standing height prediction, a heritable and ancestry-influenced trait, in an admixed Latino cohort (HCHS/SOL) by modeling ancestry using principal components (PCs) alongside PGS. SNPs were selected from a large European ancestry GWAS using various p-value thresholds, an...

2
TEMR: Trans-ethnic Mendelian Randomization Method using Large-scale GWAS Summary Datasets
2024-06-17 genetic and genomic medicine 10.1101/2024.06.16.24308874
#1 (8.7%)

Available large-scale GWAS summary datasets predominantly stem from European populations, while sample sizes for other ethnicities, notably Central/South Asian, East Asian, African, Hispanic, etc. remain comparatively limited, which induces the low precision of causal effect estimation within these ethnicities using Mendelian Randomization (MR). In this paper, we propose a Trans-ethnic MR method called TEMR to improve statistical power and estimation precision of MR in the target population usin...

3
A novel computational methodology for GWAS multi-locus analysis based on graph theory and machine learning
2021-10-26 epidemiology 10.1101/2021.10.22.21265388
#1 (8.3%)

Background: Current form of genome-wide association studies (GWAS) is inadequate to accurately explain the genetics of complex traits due to the lack of sufficient statistical power. It explores each variant individually, but current studies show that multiple variants with varying effect sizes actually act in a concerted way to develop a complex disease. To address this issue, we have developed an algorithmic framework that can effectively solve the multi-locus problem in GWAS with a very high le...

4
Impact of AKR1C2 and AKR1C3 single nucleotide polymorphism rs28571848 in adipose tissues of individuals with severe obesity
2025-10-14 genetic and genomic medicine 10.1101/2025.10.10.25337731
#1 (8.2%)

Background: Adipose tissue androgen turnover, dictated at least in part by the enzymes AKR1C2 and AKR1C3, has been linked to abdominal obesity. Recently, we investigated a single-nucleotide polymorphism (SNP) named rs28571858, that might increase AKR1C2 and AKR1C3 expression in human adipose tissue. Here, we studied the impact of rs28571848 on adipose tissue function and cardiometabolic health in bariatric surgery candidates. Methods: We genotyped a sample of 2776 bariatric surgery candidates and r...

5
Overcome the Limitation of Phenome-Wide Association Studies (PheWAS): Extension of PheWAS to Efficient and Robust Large-Scale ICD Codes Analysis
2024-04-19 health informatics 10.1101/2024.04.15.24305098
#1 (7.9%)

The Phenome-wide association studies (PheWAS) have become widely used for efficient, high-throughput evaluation of relationship between a genetic factor and a large number of disease phenotypes, typically extracted from a DNA biobank linked with electronic medical records (EMR). Phecodes, billing code-derived disease case-control status, are usually used as outcome variables in PheWAS and logistic regression has been the standard choice of analysis method. Since the clinical diagnoses in EMR are...

6
Genomic summary statistics and meta-analysis for set-based gene-environment interaction tests in large-scale sequencing studies
2022-05-10 genetic and genomic medicine 10.1101/2022.05.08.22274819
#1 (7.8%)

We propose an efficient method to generate the summary statistics for set-based gene-environment interaction tests, as well as a meta-analysis approach that aggregates the summary statistics across different studies, which can be applied to large biobank-scale sequencing studies with related samples. Simulations showed that meta-analysis is numerically concordant with the equivalent pooled analysis using individual-level data. Moreover, meta-analysis accommodates heterogeneity between studies an...

7
Multi-organ genetic causal connections inferred from imaging and clinical data through Mendelian randomization
2023-05-28 genetic and genomic medicine 10.1101/2023.05.22.23290355
#1 (7.7%)

Understanding the complex causal relationships among major clinical outcomes and the causal interplay among multiple organs remains a significant challenge. By using imaging phenotypes, we can characterize the functional and structural architecture of major human organs. Mendelian randomization (MR) provides a valuable framework for inferring causality by leveraging genetic variants as instrumental variables. In this study, we conducted a systematic multi-organ MR analysis involving 402 imaging ...

8
Bayesian estimation of shared polygenicity identifies drug targets and repurposable medicines for human complex diseases
2025-03-17 genetic and genomic medicine 10.1101/2025.03.17.25324106
#1 (7.7%)

Complex diseases share heritable components which can be leveraged to identify drug targets with low side effect or high repurposing potential, but current methods cannot efficiently make these inferences at scale using public data. We introduce a Bayesian model to estimate the polygenic structure of a trait using GWAS summary data (BPACT). Across 32 complex traits, we estimated that 69.5 to 97.5% of disease-associated druggable genes are shared between multiple traits. We observed that targetin...

9
Polygenic risk vectors (PRV) improve genetic risk stratification for cardio-metabolic diseases
2022-03-11 genetic and genomic medicine 10.1101/2022.03.02.22271425
#1 (7.5%)

Accurate disease risk stratification can lead to more precise and personalized prevention and treatment of diseases. As an important component to disease risk, genetic risk factors can be utilized as an early and stable predictor for disease onset. Recently, the polygenic risk score (PRS) method has combined the effects from hundreds to millions of single nucleotide polymorphisms (SNPs) into a score that can be used for genetic risk stratification. However, current PRS approaches only utilize ...

10
Clustering Of Rare Variants For Causal Variants Identification And Effect Direction Classification
2024-02-23 genetic and genomic medicine 10.1101/2024.02.22.24303151
#1 (7.4%)

Several gene-based tests, e.g., sequence kernel association test, have been developed for association testing of rare single nucleotide variants (SNVs) in genomic regions with disease traits. A common limitation of these aggregate methods is their inability to discriminate potentially causal variants from null variants within the tested regions. We propose a novel clustering method to classify rare variants into null and signal variant groups using summary statistics from the gene-based tests ba...

11
Association of Multiple Trait Polygenic Risk Score with Obesity and Cardiometabolic Diseases in Korean population
2025-04-16 endocrinology 10.1101/2025.04.13.25325699
#1 (6.4%)

We conducted a comprehensive genetic investigation of obesity in a cohort of 93,673 Korean individuals, categorized by both body mass index and waist circumference using Korean-specific and international criteria. To explore the genetic architecture of obesity and its comorbidities, we performed genome-wide association studies and constructed polygenic risk scores (PRSs) using both conventional single trait and advanced multiple-trait models, including the PRSsum approach. Our analyses identifi...

12
Real-time dynamic polygenic prediction for streaming data
2024-07-14 genetic and genomic medicine 10.1101/2024.07.12.24310357
#1 (6.4%)

Polygenic risk scores (PRSs) are promising tools for advancing precision medicine. However, existing PRS construction methods rely on static summary statistics derived from genome-wide association studies (GWASs), which are often updated at lengthy intervals. As genetic data and health outcomes are continuously being generated at an ever-increasing pace, the current PRS training and deployment paradigm is suboptimal in maximizing the prediction accuracy of PRSs for incoming patients in healthcar...

13
Genetic variation affects morphological retinal phenotypes extracted from UK Biobank Optical Coherence Tomography images
2020-07-26 genetic and genomic medicine 10.1101/2020.07.20.20157180
#1 (6.4%)

Optical Coherence Tomography (OCT) enables non-invasive imaging of the retina and is often used to diagnose and manage multiple ophthalmic diseases including glaucoma. We present the first large-scale quantitative genome-wide association study of inner retinal morphology using phenotypes derived from OCT images of 31,434 UK Biobank participants. We identify 46 loci associated with thickness of the retinal nerve fibre layer or ganglion cell inner plexiform layer. Only one of these loci has previo...

14
The Great Genotyper: A Graph-Based Method for Population Genotyping of Small and Structural Variants
2024-07-05 genetic and genomic medicine 10.1101/2024.07.04.24309921
#1 (6.4%)

Long-read sequencing (LRS) enables variant calling of high-quality structural variants (SVs). Genotypers of SVs utilize these precise call sets to increase the recall and precision of genotyping in short-read sequencing (SRS) samples. With the extensive growth in availability of SRS datasets in recent years, we should be able to calculate accurate population allele frequencies of SV. However, reprocessing hundreds of terabytes of raw SRS data to genotype new variants is impractical for populatio...

15
Genome-wide risk prediction of primary open-angle glaucoma across multiple ancestries
2023-11-08 genetic and genomic medicine 10.1101/2023.11.08.23298255
#1 (6.4%)

Withdrawal statement: This manuscript has been withdrawn by medRxiv following a formal request by the QIMR Berghofer Medical Research Institute Research Integrity Office owing to lack of author consent.

16
Robust Mixed Model Association Test for Gene-Environment Interactions
2025-10-03 genetic and genomic medicine 10.1101/2025.10.01.25336808
#1 (6.4%)

Linear mixed models (LMMs) are widely used in gene-environment interaction (GEI) studies to account for population structure and relatedness. However, genome-wide GEI tests using LMMs are computationally intensive, and model-based tests can yield inflated type I error rates when environmental main effects are misspecified. While robust inference methods exist for unrelated samples, challenges remain for related individuals. A common workaround is a two-step approach that first adjusts for relate...

17
Convex approaches to isolate the shared and distinct genetic structures of subphenotypes in heterogeneous complex traits
2025-04-16 genetic and genomic medicine 10.1101/2025.04.15.25325870
#1 (6.4%)

Groups of complex diseases, such as coronary heart diseases, neuropsychiatric disorders, and cancers, often display overlapping clinical symptoms and pharmacological treatments. The shared associations of genetic variants across diseases has the potential to explain their underlying biological processes, but this remains poorly understood. To address this, we model the matrix of summary statistics of trait-associated genetic variants as the sum of a low-rank component - representing shared biolo...

18
MaSk-LMM: A Matrix Sketching Framework for Linear Mixed Models in Association Studies
2023-11-13 genetic and genomic medicine 10.1101/2023.11.13.23298469
#1 (6.3%)

Linear mixed models (LMMs) have been widely used in genome-wide association studies (GWAS) to control for population stratification and cryptic relatedness. Unfortunately, estimating LMM parameters is computationally expensive, necessitating large-scale matrix operations to build the genetic relatedness matrix (GRM). Over the past 25 years, Randomized Linear Algebra has provided alternative approaches to such matrix operations by leveraging matrix sketching, which often results in provably accur...

19
Getting to GRIPS with MR-Egger: modelling directional pleiotropy independently of allele coding
2025-06-24 epidemiology 10.1101/2025.06.24.25330193
#1 (6.3%)

Mendelian Randomisation Egger regression (MR-Egger) is a popular method for causal inference using single-nucleotide polymorphisms (SNPs) as instrumental variables. It allows all SNPs to have direct pleiotropic effects on the outcome, provided that those effects are independent of the effects on the exposure, known as the InSIDE assumption. However, the results of MR-Egger, and the InSIDE assumption itself, are sensitive to which allele is coded as the effect allele for each SNP. A pragmatic con...

20
Proteome-wide Mendelian randomization implicates nephronectin as an actionable mediator of the effect of obesity on COVID-19 severity
2022-06-08 genetic and genomic medicine 10.1101/2022.06.06.22275997
#1 (6.3%)

Obesity is a major risk factor for COVID-19 severity; however, the mechanisms underlying this relationship are not fully understood. Since obesity influences the plasma proteome, we sought to identify circulating proteins mediating the effects of obesity on COVID-19 severity in humans. Here, we screened 4,907 plasma proteins to identify proteins influenced by body mass index (BMI) using Mendelian randomization (MR). This yielded 1,216 proteins, whose effect on COVID-19 severity was assessed, aga...